Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system

Authors

Vassilios Digalakis

Dimitris Oikonomidis

Dimitris Pratsolis

Nikos Tsourakis

Christos Vosnidis

Nikos Chatzichrisafis

Vassilios Diakoloukas

Abstract

In this work, we present the creation of the first Greek Speech Corpus and the implementation of a Dictation System for workflow improvement in the field of journalism. This work was carried out under the project Logotypografia (from Logos, "speech", and Typografia, "typography"), sponsored by the General Secretariat of Research and Development of Greece. This paper presents the process of data collection (texts and recordings), waveform processing (transcriptions), the creation of the acoustic and language models, and their final integration into a fully functional dictation system. An evaluation of the system is also presented. The Logotypografia database described here is available from ELRA.
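The abstract does not say which metric the evaluation uses. Dictation systems are commonly scored by word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcription, divided by the number of reference words. The sketch below is an illustrative assumption, not the paper's evaluation procedure; `word_error_rate` is a hypothetical helper that computes WER with a word-level Levenshtein alignment.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref[i-1]
                          curr[j - 1] + 1,      # insertion of hyp[j-1]
                          prev[j - 1] + cost)   # substitution or exact match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (inflected verb form) and one deletion over 5 reference words
print(word_error_rate("η κυβέρνηση ανακοίνωσε νέα μέτρα",
                      "η κυβέρνηση ανακοίνωσαν μέτρα"))
```

Here the wrong inflection counts as a full word substitution, so the example scores 2/5 = 0.4. WER can exceed 1.0 when the hypothesis contains many insertions.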


Similar resources

“My Small Slim Greek ASR System” or Automatic Speech Recognition of Modern Greek Broadcast News

In this paper we report on the development of a Modern Greek large-vocabulary continuous-speech recognition system. We discuss lexical modelling with respect to pronunciation generation and examine its effects on word accuracies. Peculiarities of Modern Greek as a highly inflectional language and their challenges for speech recognition are addressed.


Development of a Modern Greek Broadcast-News Corpus and Speech Recognition System

We report on the creation of a Modern Greek broadcast-news corpus as a prerequisite to build a large-vocabulary continuous-speech recognition system. We discuss lexical modelling with respect to pronunciation generation and examine the effects of the lexicon size on word accuracies. Peculiarities of Modern Greek as a highly inflectional language and their challenges for speech recognition are d...


Issues in Large Vocabulary, Multilingual Speech Recognition

In this paper we report on our activities in multilingual, speaker-independent, large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Europe, where each country has its own national language. Our existing recognizer for American English and French has been ported to British English and German. It has been assessed in the context of ...


Automatic diagnosis of recognition errors in large vocabulary continuous speech recognition systems

Automatic diagnosis of recognition errors in large vocabulary continuous speech recognition (LVCSR) systems is addressed. It consists of two steps. The first step is to identify the module that causes recognition errors for every erroneous segment. These statistics point out which modules need to be revised. The second step is to analyze the causes of the errors in detail. Specifically, the triphone...


Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...



Publication date: 2003
